Abstract.

This paper introduces the use of Principal Component Analysis as a method
to decompose the catalogues of gravitational waveforms to produce a set of 
orthonormal basis vectors. We apply this method to a set of gravitational 
waveforms produced by rotating stellar core-collapse simulations and compare the 
basis vectors obtained with those obtained through Gram-Schmidt decomposition. 
We observe that, for the chosen set of waveforms, the performance of the two
methods is comparable for minimal match requirements up to 0.9, with 14
Gram-Schmidt basis vectors and 12 principal components required for a minimal
match of 0.9. This implies that there are many common features in the chosen waveforms.
Additionally, we observe the chosen waveforms have very similar features and a 
minimal match of 0.7 can be obtained by decomposing only waveforms generated 
from simulations with A=2. We discuss the implications of this observation and the 
advantages of eigen-decomposing waveform catalogues with Principal Component 
Analysis. 



1. Introduction 



The current global network of interferometric detectors (GEO 600 [1], LIGO [2],
TAMA 300 [3] and Virgo [4]) has been scanning the sky for gravitational wave
signals with unprecedented sensitivity. The prospect of detection is very real and,
once a detection is made, we must ask what can be inferred about the source from the
detected gravitational wave signal. While this issue must be addressed for all signal
types, we choose to focus on core-collapse supernovae here. Numerical relativity
simulations have predicted several sets of gravitational wave signals or waveform
catalogues due to rotating core-collapse supernovae (see [5] and references therein). 
The features of each waveform produced by the simulations vary depending on the 
physics employed and the chosen initial parameters. The predicted waveforms can be 
used as inputs into parameter estimation algorithms which can, for example, calculate 
the likelihood that the detected signal corresponds to one of the predicted waveforms. 
However, from a cursory inspection of the waveform catalogues, one can see that there 
are many common features in the predicted waveforms, especially for waveforms in 
the same catalogue. By decomposing the waveforms into a set of orthonormal basis 
vectors, we can greatly reduce the computation costs of the parameter estimation 
stage by concentrating on a subset of basis vectors that encompass the main features 
of the chosen waveforms. 

We propose using Principal Component Analysis (PCA) to create an 
orthonormal set of basis vectors. Broadly speaking, PCA transforms a correlated, 
multi-dimensional data set into a set of orthogonal components. This is achieved by 
determining the eigenvectors and eigenvalues of the covariance matrix of the data 
set. The first principal component is the eigenvector with the largest corresponding 
eigenvalue. It is the linear combination of the original variables which accounts
for as much of the variability in the data as possible. Similarly, the second principal
component is the linear combination which accounts for as much of the remaining
variability as possible, subject to the constraint that it is orthogonal to the first
principal component, and so on. In recent years, PCA has been applied to a number
of astrophysical problems (see [6] for a recent example), such as spectral classification,
photometric redshift determination and morphological analysis of galaxy redshift
surveys, as well as a wider class of image processing and pattern recognition problems
across a range of scientific applications. For a detailed account of the statistical basis
of PCA the reader is referred to, for example, Morrison (1967) [7] or Mardia et al.
(1979) [8].

It must be noted that this is not the first approach proposed to decompose
a waveform catalogue. Brady and Ray-Majumder [9] have previously applied
Gram-Schmidt decomposition to the Zwerger-Müller [10] and Ott et al. [12]
waveform catalogues. Additionally, Summerscales et al. [14] developed a Maximum
Entropy based method to identify the presence of a gravitational wave signal and
demonstrated the method's ability to extract the correct amplitude and phase of
waveforms from a catalogue by Ott et al. (2004) [12].

In this article, we choose the waveform catalogue generated by simulations
performed by Dimmelmeier et al. [13] to demonstrate the use of PCA. These
simulations focus on rotating stellar core-collapse and the subsequent bounce. While
this phase of a supernova has been the focus of numerous simulations over the years,
recent simulations by Burrows et al. have produced large post-bounce gravitational
wave signals due to acoustic shock mechanisms [11]. There is still much discussion
about this mechanism, so we choose to concentrate only on the gravitational wave
signal produced by the core-collapse phase. Moreover, we would like to stress that
the choice of catalogue here is, to a large extent, arbitrary and the methods discussed
in this paper can be easily applied to the other catalogues or any combination thereof.
Additionally, we compare the basis vectors obtained by PCA and Gram-Schmidt
decomposition by applying them to the same set of waveforms. We then describe
our observations of the waveform catalogue before discussing their implications.

2. Methods 

2.1. Gram-Schmidt decomposition

Gram-Schmidt (GS) decomposition is a recursive method for decomposing a set of 
waveforms to create a set of orthonormal basis vectors [15] . It was first applied to a 
supernova catalogue by Brady and Ray-Majumder [9]. For completeness, the main 
points of this method are reviewed below. 

In GS decomposition, one begins by selecting a waveform from the data set 
as the first basis vector. To create a second basis vector, the first basis vector is 
first projected onto the next waveform to be included into the set of basis vectors. 
The projected component is then subtracted from the second waveform and the 
resulting vector is orthogonal to the first basis vector. One continues this process 
by subtracting the sum of the projections of all existing basis vectors onto the
desired waveform. This is done recursively until the desired number of waveforms 
is included in the set of basis vectors. More explicitly, for a set of waveforms,
$\{H_1, H_2, \ldots, H_M\}$, the orthonormal basis vectors, $\{e_1, e_2, \ldots, e_M\}$, are

$$ e_i = \frac{H'_i}{\sqrt{(H'_i, H'_i)}} $$ (1)

with

$$ H'_i = H_i - \sum_{j=1}^{i-1} (H_i, e_j)\, e_j $$ (2)

and $i = 1, \ldots, M$, where $M$ is the total number of waveforms. Here, the brackets
denote an inner product. Explicitly, the inner product for two vectors $a$ and $b$, each
of length $n$, is given by

$$ (a, b) = \sum_{i=1}^{n} a_i b_i $$ (3)

where $a_i$ denotes the $i$-th element of the vector $a$.

Note that the second term in Equation [2] is the sum of the projections onto all
previously formed basis vectors. Therefore, the left-hand side of Equation [2] is the
residual of the $i$-th waveform that is not described by previously generated basis vectors.

Brady and Ray-Majumder point out that the first waveform is chosen
arbitrarily and may not produce the basis vector set that spans the waveforms most
efficiently. Therefore, the basis set is constructed repeatedly, with a different
initial waveform chosen each time, until the basis set that spans the waveforms'
parameter space with the fewest basis vectors is obtained.
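
The recursion of equations (1) and (2) can be illustrated with a minimal NumPy sketch. It is not the implementation used for the results in this paper: the function name `gram_schmidt`, the cut-off `tol` and the damped-sinusoid test signals are assumptions introduced here for illustration only, and the search over different initial waveforms is omitted.

```python
import numpy as np


def gram_schmidt(waveforms, tol=1e-10):
    """Orthonormalise the columns of `waveforms` following equations (1) and (2).

    `tol` is a hypothetical cut-off: residuals shorter than this are treated
    as already described by the existing basis vectors and are skipped.
    """
    basis = []
    for h in waveforms.T.astype(float):
        # Equation (2): subtract the projections onto all existing basis vectors.
        residual = h - sum(np.dot(h, e) * e for e in basis)
        norm = np.sqrt(np.dot(residual, residual))
        if norm > tol:
            # Equation (1): normalise the residual.
            basis.append(residual / norm)
    return np.column_stack(basis)


# Synthetic stand-ins for catalogue waveforms (NOT the real catalogue).
t = np.linspace(0.0, 1.0, 200)
waveforms = np.column_stack(
    [np.exp(-5.0 * t) * np.sin(2.0 * np.pi * f * t) for f in (3.0, 5.0, 8.0, 13.0)]
)

basis = gram_schmidt(waveforms)
gram = basis.T @ basis                     # close to the identity if orthonormal
rebuilt = basis @ (basis.T @ waveforms)    # reconstruction from the projections
```

Because the first basis vector is simply the first column after normalisation, reordering the columns of `waveforms` changes the resulting basis, which is why the choice of initial waveform matters.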



2.2. Principal Component Analysis 

In Principal Component Analysis (PCA), a basis set is formed by determining the
eigenvectors of the covariance matrix of the desired data set. In the context of this
article, let us arrange the waveforms from the catalogue $\{H\}$ into a matrix $H$ such
that each column corresponds to one of the waveforms, $H_i$. For $M$ waveforms, each
of length $N$, the matrix $H$ has dimensions of $N \times M$ and the covariance matrix for
$H$ is calculated by

$$ C = \frac{1}{M} H H^T $$ (4)

where $C$ is the covariance matrix with dimensions $N \times N$ for waveforms with length
$N$. The normalised eigenvectors of $C$ form a set of basis vectors, $\{e_1, e_2, \ldots, e_M\}$,
that span the parameter space defined by the waveforms in $H$. Note that in PCA,
the eigenvalues of the covariance matrix tell us how well each eigenvector spans the
parameter space of the waveform catalogue. The eigenvectors are, therefore, ranked
by their corresponding eigenvalues, with the first principal component having the
largest eigenvalue.

Supernova waveforms have significant energy at high frequencies (~1 kHz),
so $N$ can be about 1000 data samples at the LIGO data sampling rate (16384 Hz), or
more at the Virgo sampling rate (20 kHz). Determining the eigenvectors of a matrix of
such dimensions is computationally expensive. A common method of avoiding this
computationally intensive operation (see [16] for example) is to first calculate the
eigenvectors, $v_i$, of $H^T H$ such that

$$ H^T H v_i = \lambda_i v_i $$ (5)

where $\lambda_i$ is the corresponding eigenvalue for each eigenvector. Then, by
pre-multiplying both sides by $H$, we have

$$ H H^T H v_i = \lambda_i H v_i $$ (6)

If we rewrite Equation [4] so that the covariance matrix takes the form $C = H H^T$,
then $H v_i$ are the eigenvectors of the covariance matrix. So, for $M \ll N$, we can
determine the eigenvectors of the covariance matrix by first calculating the eigenvectors
of the smaller $H^T H$, which is an $M \times M$ matrix, thereby significantly reducing
computational costs.
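
As a rough illustration of this shortcut, the sketch below computes the principal components from the $M \times M$ matrix $H^T H$ and checks them against the $N \times N$ covariance matrix. The function name `pca_basis`, the relative cut-off used to discard null directions and the random test matrix are assumptions made here, not part of the analysis in this paper. Following the covariance matrix as written above, no mean is subtracted. The vectors $H v_i$ are divided by $\sqrt{\lambda_i}$ because $\|H v_i\|^2 = v_i^T H^T H v_i = \lambda_i$ for unit-norm $v_i$.

```python
import numpy as np


def pca_basis(H):
    """Principal components of the columns of H via the M x M matrix H^T H."""
    n_waveforms = H.shape[1]
    eigvals, V = np.linalg.eigh(H.T @ H)       # small eigenproblem; ascending order
    order = np.argsort(eigvals)[::-1]          # rank by decreasing eigenvalue
    eigvals, V = eigvals[order], V[:, order]
    keep = eigvals > 1e-12 * eigvals[0]        # hypothetical cut for null directions
    E = H @ V[:, keep]                         # H v_i are eigenvectors of H H^T
    E = E / np.sqrt(eigvals[keep])             # normalise, since ||H v_i||^2 = lambda_i
    return E, eigvals[keep] / n_waveforms      # eigenvalues of C = H H^T / M


rng = np.random.default_rng(1)
H = rng.standard_normal((300, 6))              # synthetic N x M stand-in, N >> M

E, variances = pca_basis(H)
C = H @ H.T / H.shape[1]                       # full covariance, built only as a check
```

Here the eigenproblem solved is 6 x 6 rather than 300 x 300; the columns of `E` satisfy `C @ E = E * variances` to numerical precision.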

3. Results 

3.1. The waveforms 

The waveform catalogue used here to demonstrate the use of PCA was produced by
Dimmelmeier et al. [13]. These waveforms are generated from axisymmetric general
relativistic hydrodynamic simulations of stellar core-collapse to a proto-neutron star.
They use the microphysics equation of state from Shen et al. [18] with a $20\,M_\odot$
progenitor model from Woosley et al. [19]. There is a total of 54 waveforms in
this catalogue, generated by models which are parameterised by initial differential
rotation, $A$, and the ratio of the rotational kinetic to gravitational energies, $\beta$.
The values of $\beta$ are increased from 0.05% to 4% in 18 steps, while three values of
initial differential rotation were used. The three values, labelled A = 1, A = 2 and
A = 3, correspond to differential rotation occurring at 50000 km (almost uniform
rotation), 1000 km and 500 km respectively. According to the rotation law [20], the
angular velocity has dropped to 1/2 its central value at a distance of A from the
rotation axis. Hence, smaller values of A correspond to more differential rotation.

Footnote to Section 2.2: The method laid out here is similar to performing a singular
value decomposition (SVD) [17] of the matrix $H$. In SVD, equations [5] and [6],
$H^T H v_i = \lambda_i v_i$ and $H H^T H v_i = \lambda_i H v_i$, are the equivalent of using
the right-singular vectors, which are the eigenvectors of $H^T H$, to determine the
left-singular vectors, which are the eigenvectors of $H H^T$.

3.2. Comparing GS and PCA basis vectors

We introduce the match parameter, $\mu$, to quantify how well a set of basis vectors
reconstructs a specified waveform. For a waveform, $H_i$, $\mu_i$ is calculated by summing
the projections of the desired number of basis vectors, $Z$, onto the waveform such
that

$$ \mu_i = \left\| \sum_{j=1}^{Z} (H_i, e_j)\, e_j \right\| $$ (7)

where $e_j$ are the orthonormal basis vectors determined by the methods described in
the previous section. As with equations [1] and [2], the brackets denote an
inner product. If we normalise the set of waveforms, then $\mu_i$ will be equal to 1 if
the sum of the projections of the basis vectors matches a particular waveform, $H_i$,
exactly.

It is clear that $\mu$ will be equal to 1 for all waveforms in the catalogue if we use all
basis vectors decomposed from the catalogue ($Z = M$). However, it is interesting to
calculate the smallest match obtained for any waveform in the catalogue (commonly
referred to as the minimal match) if we use a subset of basis vectors. The minimal
match, $\mu_{\min}$, is often used in templated matched filter searches for signals with
well-modelled waveforms. For such searches, the basis vectors form a bank of templates
and the minimal match is used to characterise how well the desired parameter space is
covered by the template bank. To maximise the detection probability, one would
maximise the minimal match with the smallest number of templates so as to minimise
computational time.

Computing time, however, is not a serious issue for these waveforms because they
are short and relatively few compared to, for example, the number of templates used
in a search for gravitational waves from binary neutron stars (see [21] for a recent
example). Instead, we examine the minimal match here to study the parameter space
covered by the waveform catalogue. If the parameter space of the waveform catalogue
is degenerate, then one would expect the minimal match to rapidly approach 1 for
$Z < M$.

Figure [1] shows the number of GS and PCA basis vectors required as a function
of minimal match. Similar numbers of GS and PCA basis vectors are required for
minimal match requirements up to about 0.9. The number of basis vectors required rises
rapidly as the minimal match criterion approaches 1, since smaller features, unique to
a small subset of waveforms, require a large number of basis vectors to reconstruct.

Figure 1. The number of PCA and GS basis vectors as a function of minimal
match. For each value of minimal match, $\mu_{\min}$, we plot the number of basis
vectors required so that $\mu > \mu_{\min}$ for all waveforms in the catalogue. The number
of basis vectors required for each value of minimal match is comparable for both
methods of decomposition.
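
The match of equation (7), and the count of basis vectors needed for a given minimal match as plotted in Figure 1, can be sketched as follows. This is only an illustration under stated assumptions: the catalogue is a synthetic mixture of three damped sinusoids, the helper names `match` and `basis_needed` are introduced here, the threshold is applied as "at least" rather than "greater than", and the left-singular vectors of the waveform matrix stand in for the PCA basis vectors.

```python
import numpy as np


def match(H, E, Z):
    """Equation (7) for every column of H, using the first Z columns of E."""
    unit = H / np.linalg.norm(H, axis=0)       # normalise each waveform
    coeffs = E[:, :Z].T @ unit                 # inner products (H_i, e_j)
    # For orthonormal e_j, the norm of the summed projections equals the
    # root-sum-square of the coefficients.
    return np.linalg.norm(coeffs, axis=0)


def basis_needed(H, E, mu_min):
    """Smallest Z for which every waveform has a match of at least mu_min."""
    for Z in range(1, E.shape[1] + 1):
        if match(H, E, Z).min() >= mu_min:
            return Z
    return None


# Synthetic catalogue: 12 waveforms mixing 3 shared templates plus weak noise.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 256)
templates = np.column_stack(
    [np.exp(-4.0 * t) * np.sin(2.0 * np.pi * f * t) for f in (3.0, 7.0, 12.0)]
)
H = templates @ rng.uniform(0.5, 1.5, size=(3, 12))
H = H + 1e-3 * rng.standard_normal(H.shape)

# Left-singular vectors, ordered by singular value, stand in for the PCA basis.
E, _, _ = np.linalg.svd(H, full_matrices=False)

z_for_09 = basis_needed(H, E, 0.9)             # basis vectors for minimal match 0.9
full_match = match(H, E, H.shape[1])           # all basis vectors: match of 1
```

Because this toy catalogue is built from only three shared templates, the minimal match saturates after very few basis vectors, which is the degenerate behaviour described above.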



It is interesting to note that for minimal match requirements greater than 0.95,
more GS basis vectors are needed. Nonetheless, the parameter space spanned by the
waveform catalogue is well spanned by fewer than half the total number of basis vectors
for each method. This implies that all waveforms in the catalogue are dominated
by a few unique features and this allows the minimal match to reach 0.75 with just
7 basis vectors. In fact, Dimmelmeier et al. noted that these waveforms can be
divided into three broad categories: waveforms due to a pressure-dominated bounce
with convective overturn, waveforms due to a pressure-dominated bounce only and
waveforms with a single centrifugal bounce [13].

In Figure [2], we plot an example waveform reconstructed by the GS and PCA
basis vectors with a match of 0.9. The reconstructed waveforms obtained by the two
sets of basis vectors, though not identical, are very similar. The difference between
the two waveforms is only about 10% in amplitude.




Figure 2. The reconstructed waveforms using PCA and GS basis vectors are
plotted as black lines in panels (a) and (b) respectively. The grey line is the original
waveform (A = 2, $\beta$ = 0.50) from the catalogue. A match of 0.95 was achieved using
6 PCA basis vectors and 17 GS basis vectors. For comparison, both reconstructed
waveforms are plotted in (c), where the grey line is the PCA reconstruction and
the GS reconstruction is represented by the black line. The difference between the
two reconstructions is plotted in (d).

3.3. Using a subset of the waveforms to form basis vectors

In the previous subsection, we noted that the parameter space of the Dimmelmeier et
al. waveform catalogue used in our studies can be spanned by a small number of basis
vectors. This implies that there are many common features in the waveforms from
this catalogue. Here, we choose to make basis vectors using only the 18 waveforms
with moderate differential rotation at 1000 km from the centre (A = 2). We make this
choice to test the hypothesis that the waveforms from precollapse stellar cores with
moderate differential rotation contain features present in waveforms from low and
highly differentially rotating stellar bodies. Figure [3] plots the number of waveforms
with A ≠ 2 observed to have a match greater than 0.7 and 0.9. With only 3 basis
vectors, about 30 of the 36 waveforms already have a match greater than 0.7. With
16 PCA or GS basis vectors, 67% of the remaining 36 waveforms have a match greater
than 0.9 and 94% have a match greater than 0.7. Therefore, a large fraction of the
parameter space covered by the catalogue is covered by waveforms from simulations
with A = 2. This is consistent with the observations of Dimmelmeier et al. [13], [22],
who noted that the degree of differential rotation does not qualitatively alter the
waveforms.
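
The held-out test described in this subsection can be sketched in a few lines: build the basis from one family of waveforms only, then count how many of the remaining waveforms exceed a match threshold. Everything in the example is an assumption made for illustration: the 54 synthetic waveforms share four templates by construction, the split into three groups of 18 only mimics the A = 1, 2, 3 families, and the helper `count_matched` is introduced here.

```python
import numpy as np


def count_matched(H_train, H_test, Z, threshold):
    """Number of held-out waveforms whose match exceeds `threshold` when
    reconstructed with Z basis vectors built from H_train only."""
    E, _, _ = np.linalg.svd(H_train, full_matrices=False)
    unit = H_test / np.linalg.norm(H_test, axis=0)
    mu = np.linalg.norm(E[:, :Z].T @ unit, axis=0)
    return int(np.sum(mu > threshold))


# Synthetic catalogue of 54 waveforms sharing 4 templates (NOT the real data).
rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 256)
templates = np.column_stack(
    [np.exp(-4.0 * t) * np.sin(2.0 * np.pi * f * t) for f in (2.0, 5.0, 9.0, 14.0)]
)
catalogue = templates @ rng.uniform(0.5, 1.5, size=(4, 54))
catalogue = catalogue + 1e-3 * rng.standard_normal(catalogue.shape)

H_train = catalogue[:, 18:36]                          # middle family builds the basis
H_test = np.delete(catalogue, np.s_[18:36], axis=1)    # the remaining 36 waveforms

# Counts of held-out waveforms above a match of 0.7 for Z = 1, ..., 18.
counts_07 = [count_matched(H_train, H_test, Z, 0.7) for Z in range(1, 19)]
```

Plotting `counts_07` against the number of basis vectors gives a curve of the same kind as Figure 3, though the numbers themselves carry no physical meaning for this toy data set.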

4. Conclusions and discussion 

We have introduced PCA as a method of decomposing a set of waveforms into a
set of basis vectors. A nice feature of PCA decomposition is that it allows one to 
quantitatively identify the main features in a desired set of waveforms since each basis 
vector is ranked by the value of its corresponding eigenvalue. One can interpret the 
basis vector with the largest corresponding eigenvalue (the first principal component) 
as having the most significant features in the waveform catalogue. 

We compared the PCA method introduced here to the GS decomposition method
introduced by Brady and Ray-Majumder. The efficiency of the PCA basis vectors
at spanning the parameter space defined by the waveform catalogue is comparable
to that of GS decomposition, with about 15 basis vectors required for a minimal match of
0.9. For a minimal match of 0.95, 17 PCA basis vectors were required, compared with
22 GS basis vectors. This shows that there are many common features in the waveforms
from the chosen catalogue. We also generated a set of basis vectors using only
18 of the 54 waveforms (those with A = 2) using both methods and observed that 34
of the 36 waveforms not included in the construction of the basis set have a
match of at least 0.7. This implies that the features from all waveforms are well described
by models with A = 2.

Figure 3. The number of waveforms with a match of at least 0.7 (dashed lines)
and 0.9 (solid lines) as a function of the number of basis vectors.

The basis vectors produced here can easily be used by parameter estimation
techniques. For example, Markov chain Monte Carlo (MCMC) methods [23] can be
applied to a detected gravitational wave signal with each basis vector as a degree
of freedom to search across. The output of the MCMC analysis would be a set
of coefficients that can be used to reconstruct the signal waveform from a linear
combination of basis vectors. Alternatively, we can project the waveforms onto the
basis vectors to determine a set of coefficients with which we can reconstruct each
waveform with the basis vectors. Each waveform can then be parameterised by these
coefficients or weights and they can be used to form a classification scheme similar
to that laid out by Turk and Pentland [23]. PCA as well as GS decomposition can
also be used to decompose waveforms generated by simulations from different groups,
using different core-collapse models. This application (also proposed by Brady and
Ray-Majumder [9]) will combine the parameter space covered by all waveforms in
an efficient manner. In the case of PCA, common features will be decomposed
into the main components with large eigenvalues and, for parameter estimation, will
reconstruct the main features of most waveforms. On the other hand, smaller features
belonging to a small subset of waveforms will have much smaller eigenvalues and may
be ignored by the analysis to reduce computational costs.
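
The projection onto basis vectors and the use of the resulting coefficients for classification can be sketched as below. This is a toy nearest-neighbour scheme in coefficient space, assumed here as one possible reading of such a classification scheme; the function `classify`, the synthetic catalogue and the noise level are all hypothetical and are not taken from the analysis in this paper.

```python
import numpy as np


def classify(E, catalogue, signal):
    """Index of the catalogue waveform nearest to `signal` in coefficient space."""
    weights = E.T @ catalogue                  # one column of coefficients per waveform
    signal_weights = E.T @ signal              # coefficients of the candidate signal
    distances = np.linalg.norm(weights - signal_weights[:, None], axis=0)
    return int(np.argmin(distances))


# Synthetic catalogue: 10 waveforms mixing 3 shared templates (NOT the real data).
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 256)
templates = np.column_stack(
    [np.exp(-4.0 * t) * np.sin(2.0 * np.pi * f * t) for f in (3.0, 7.0, 12.0)]
)
catalogue = templates @ rng.uniform(0.5, 1.5, size=(3, 10))

U, _, _ = np.linalg.svd(catalogue, full_matrices=False)
E = U[:, :3]                                   # three components span this toy set

weights = E.T @ catalogue                      # 3 coefficients parameterise each waveform
rebuilt = E @ weights                          # linear combination of basis vectors

# A noisy copy of waveform 6 plays the role of a detected signal.
signal = catalogue[:, 6] + 1e-3 * rng.standard_normal(t.size)
best = classify(E, catalogue, signal)
```

Each waveform of length 256 is reduced to three weights in this example, so the comparison between a candidate signal and the catalogue is carried out in three dimensions rather than 256.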